Confusion Matrix-based Feature Selection

Authors

  • Sofia Visa
  • Brian Ramsay
  • Anca L. Ralescu
  • Esther van der Knaap

Abstract

This paper introduces a new technique for feature selection and illustrates it on a real data set. Specifically, the proposed approach creates subsets of attributes based on two criteria: (1) individual attributes have high discrimination (classification) power; and (2) the attributes in the subset are complementary; that is, they misclassify different classes. The method uses information from a confusion matrix and evaluates one attribute at a time.
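The two selection criteria in the abstract can be illustrated with a minimal sketch. It assumes that a confusion matrix has already been computed for each attribute by a classifier trained on that attribute alone (rows are true classes, columns are predicted classes). Criterion (1) is read here as overall accuracy at or above a threshold, and criterion (2) as a greedy rule that admits an attribute only if the class it misclassifies most is not already the weak spot of an attribute selected earlier. The function names, the `min_accuracy` threshold of 0.6 and the example matrices are hypothetical; this is an illustration of the idea, not the authors' exact procedure.

```python
def accuracy(cm):
    """Overall accuracy: correct (diagonal) counts divided by all counts."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_error(cm):
    """Misclassification rate of each true class (rows = true, cols = predicted)."""
    return [1.0 - cm[i][i] / sum(row) for i, row in enumerate(cm)]


def worst_class(cm):
    """Index of the class this attribute misclassifies most often."""
    errors = per_class_error(cm)
    return max(range(len(errors)), key=errors.__getitem__)


def select_complementary(confusions, min_accuracy=0.6):
    """Greedy sketch of confusion-matrix-based feature selection.

    confusions: dict mapping a (hypothetical) attribute name to the square
    confusion matrix of a classifier trained on that attribute alone.
    """
    # Criterion (1): keep individually accurate attributes, best first.
    ranked = sorted(
        (a for a in confusions if accuracy(confusions[a]) >= min_accuracy),
        key=lambda a: accuracy(confusions[a]),
        reverse=True,
    )
    selected, covered = [], set()
    for attr in ranked:
        # Criterion (2): only add attributes whose weakest class is new,
        # so the subset's members misclassify different classes.
        weak = worst_class(confusions[attr])
        if weak not in covered:
            selected.append(attr)
            covered.add(weak)
    return selected


# Hypothetical confusion matrices for four attributes on a 3-class problem.
example = {
    "a": [[9, 1, 0], [0, 10, 0], [2, 3, 5]],   # accuracy 0.80, weakest on class 2
    "b": [[5, 3, 2], [0, 10, 0], [0, 0, 10]],  # accuracy 0.83, weakest on class 0
    "c": [[10, 0, 0], [1, 9, 0], [3, 3, 4]],   # accuracy 0.77, weakest on class 2
    "d": [[3, 3, 4], [3, 4, 3], [4, 3, 3]],    # accuracy 0.33, fails criterion (1)
}

print(select_complementary(example))  # ['b', 'a']
```

Attribute `c` is dropped despite acceptable accuracy because, like `a`, it fails mostly on class 2, while `d` is discarded by the accuracy threshold before complementarity is considered.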

Publication date: 2011